AITopics | Tunis Governorate

Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. The same text translated into different languages can have drastically different tokenization lengths, with differences up to 15 times in some cases. These disparities persist even for tokenizers that are intentionally trained for multilingual support.
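To make the idea of differing tokenization lengths concrete, here is a minimal sketch. It assumes a byte-level scheme in which every UTF-8 byte is one token; this is a stand-in chosen for illustration, not one of the tokenizers studied in the paper, and the function names `byte_token_count` and `length_ratio` are hypothetical.

```python
def byte_token_count(text: str) -> int:
    """Count tokens under an assumed byte-level scheme: one token per UTF-8 byte."""
    return len(text.encode("utf-8"))


def length_ratio(text: str, reference: str) -> float:
    """Ratio of the token count of `text` to the token count of `reference`."""
    ref = byte_token_count(reference)
    if ref == 0:
        raise ValueError("reference text must be non-empty")
    return byte_token_count(text) / ref


if __name__ == "__main__":
    english = "Good morning"
    hindi = "सुप्रभात"  # illustrative translation picked for this sketch, not from the paper
    # Devanagari code points take 3 bytes each in UTF-8, ASCII letters take 1,
    # so the same greeting costs more tokens under this assumed scheme.
    print(length_ratio(hindi, english))
```

A subword tokenizer would give different numbers, but the same ratio-based comparison applies: tokenize parallel sentences and divide the counts.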

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > Haiti (0.14)
Asia > Philippines > Luzon > Ilocos Region > Province of Pangasinan (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
(38 more...)

Genre: Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)


Programming in Assembly Is Brutal, Beautiful, and Maybe Even a Path to Better AI

WIRED Oct-13-2025, 11:00:00 GMT

Whether your chip is running a vintage computer game or the latest DeepSeek model, it'll reward you for speaking its native language. But if you took a look beneath the pixels--the rickety rides, the crowds of hungry, thirsty, barfing people (and the janitors mopping in their wake)--deep down at the level of the code, you saw craftsmanship so obsessive that it bordered on insane. Chris Sawyer, the game's sole developer, wrote the whole thing in assembly. Because if/when the machines take over, we should at least speak their language. Certain programming languages, like Python or Go or C++, are called "high-level" because they work sort of like human language, written in commands and idioms that might fit in at a poetry slam.

assembly, programming, sawyer, (16 more...)

WIRED

Country:

North America > United States > California > San Francisco County > San Francisco (0.05)
Europe > United Kingdom > Scotland (0.05)
Europe > Slovakia (0.05)
(5 more...)

Industry: Information Technology (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.72)


MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making

Yubin Kim

Neural Information Processing Systems Oct-10-2025, 09:38:04 GMT

Foundation models are becoming valuable tools in medicine.

agent, complexity, dataset, (15 more...)

Neural Information Processing Systems

Country:

Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Taiwan (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.45)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
(9 more...)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)


74bb24dca8334adce292883b4b651eda-Paper-Conference.pdf

Neural Information Processing Systems Oct-8-2025, 22:13:22 GMT

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

North America > Haiti (0.14)
Asia > Philippines > Luzon > Ilocos Region > Province of Pangasinan (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
(38 more...)

Genre: Overview (0.46)

Technology:

Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
(2 more...)


Global Sumud Flotilla reports drone attack on Gaza-bound ship in Tunisia

Al Jazeera Sep-9-2025, 06:56:16 GMT

How dangerous is the situation in the West Bank? What does survival look like inside Gaza City? The Gaza-bound Global Sumud Flotilla (GSF) says a drone has struck its main ship in the Tunisian port of Sidi Bou Said, causing a fire, but that all its passengers and crew were safe. A spokesman for the GSF blamed Israel for the incident, which occurred late on Monday, but the Tunisian National Guard said reports of a drone attack were "completely unfounded". The GSF, however, insisted the incident was a drone attack and said it would provide more details on Tuesday morning.

artificial intelligence, drone attack, social media, (14 more...)

Al Jazeera

Country:

Asia > Middle East > Palestine > Gaza Strip > Gaza Governorate > Gaza (1.00)
Asia > Middle East > Israel (0.59)
South America (0.05)
(8 more...)

Industry: Government > Military > Navy (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.95)


Working Document -- Formalising Software Requirements with Large Language Models

Beg, Arshad, O'Donoghue, Diarmuid, Monahan, Rosemary

arXiv.org Artificial Intelligence Jun-24-2025

This draft is a working document that summarizes ninety-four (94) papers, with additional sections on Traceability of Software Requirements (Section 4), Formal Methods and Its Tools (Section 5), and Unifying Theories of Programming (UTP) and Theory of Institutions (Section 6). Please refer to the abstracts of [7,8]. The key differences between this draft and our other recent works with similar titles, i.e. AACS 2025 [7] and SAIV 2025 [8], are as follows. [7] is a two-page submission to the ADAPT Annual Conference, Ireland. Submitted on 18th of March, 2025, it went through a light-weight blind review and was accepted for poster presentation; the conference was held on 15th of May, 2025. [8] is a nine-page paper with an additional nine pages of references and summary tables, submitted to the Symposium on AI Verification (SAIV 2025) on 24th of April, 2025. It went through a rigorous review process. The version uploaded to arXiv.org [8] is an improved version of the submission that addresses the specific suggestions for improving the paper.

large language model, machine learning, specification, (19 more...)

arXiv.org Artificial Intelligence

2506.14627

Country:

Europe > Ireland (0.24)
North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland (0.04)
(9 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
(2 more...)


Markov-Enhanced Clustering for Long Document Summarization: Tackling the 'Lost in the Middle' Challenge with Large Language Models

Amari, Aziz, Ammar, Mohamed Achref Ben

arXiv.org Artificial Intelligence Jun-24-2025

The rapid expansion of information from diverse sources has heightened the need for effective automatic text summarization, which condenses documents into shorter, coherent texts. Summarization methods generally fall into two categories: extractive, which selects key segments from the original text, and abstractive, which generates summaries by rephrasing the content coherently. Large language models have advanced the field of abstractive summarization, but they are resource-intensive and face significant challenges in retaining key information across lengthy documents, which we call being "lost in the middle". To address these issues, we propose a hybrid summarization approach that combines extractive and abstractive techniques. Our method splits the document into smaller text chunks, clusters their vector embeddings, generates a summary for each cluster that represents a key idea in the document, and constructs the final summary by relying on a Markov chain graph when selecting the semantic order of ideas.
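The final ordering step can be sketched as follows. This is only one plausible reading of "relying on a Markov chain graph": it assumes each chunk already has a cluster label, counts how often one cluster follows another in document order, and then greedily walks the most frequent unvisited transition. The function names and the greedy strategy are assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict


def transition_counts(labels):
    """Count cluster-to-cluster transitions between consecutive chunks."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(labels, labels[1:]):
        if a != b:  # ignore self-transitions; they carry no ordering signal
            counts[a][b] += 1
    return counts


def order_clusters(labels):
    """Greedy ordering: start at the first chunk's cluster, then repeatedly
    move to the unvisited cluster with the highest transition count.
    Ties are broken by which cluster appears first in the document."""
    if not labels:
        return []
    counts = transition_counts(labels)
    remaining = set(labels)
    current = labels[0]
    order = [current]
    remaining.discard(current)
    while remaining:
        current = min(
            remaining,
            key=lambda c: (-counts[current][c], labels.index(c)),
        )
        order.append(current)
        remaining.remove(current)
    return order


if __name__ == "__main__":
    # Hypothetical cluster labels for eight consecutive chunks.
    chunk_labels = [0, 0, 2, 2, 1, 0, 2, 1]
    print(order_clusters(chunk_labels))
```

Per-cluster summaries would then be concatenated in the returned order; normalizing the counts into probabilities would not change the greedy choice.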

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-96231-8_29

2506.18036

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Africa > Middle East > Tunisia > Tunis Governorate > Tunis (0.04)

Genre:

Overview (0.69)
Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)


Konooz: Multi-domain Multi-dialect Corpus for Named Entity Recognition

Hamad, Nagham, Khalilia, Mohammed, Jarrar, Mustafa

arXiv.org Artificial Intelligence Jun-17-2025

We introduce Konooz, a novel multi-dimensional corpus covering 16 Arabic dialects across 10 domains, resulting in 160 distinct corpora. The corpus comprises about 777k tokens, carefully collected and manually annotated with 21 entity types using both nested and flat annotation schemes, following the Wojood guidelines. While Konooz is useful for various NLP tasks like domain adaptation and transfer learning, this paper primarily focuses on benchmarking existing Arabic Named Entity Recognition (NER) models, especially cross-domain and cross-dialect model performance. Our benchmarking of four Arabic NER models using Konooz reveals a significant drop in performance of up to 38% when compared to the in-distribution data. Furthermore, we present an in-depth analysis of domain and dialect divergence and the impact of resource scarcity. We also measured the overlap between domains and dialects using the Maximum Mean Discrepancy (MMD) metric, and illustrated why certain NER models perform better on specific dialects and domains. Konooz is open-source and publicly available at https://sina.birzeit.edu/wojood/#download
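For readers unfamiliar with MMD, the sketch below computes the standard biased estimate of squared MMD between two samples of feature vectors with an RBF kernel. The kernel choice, the `gamma` value, and the toy vectors are assumptions made for illustration; the abstract does not say which features or kernel the authors used.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD:
    mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")

    def mean_kernel(a, b):
        total = sum(rbf_kernel(u, v, gamma) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors standing in for two domains.
    domain_a = [(0.0, 0.0), (1.0, 0.0)]
    domain_b = [(5.0, 5.0), (6.0, 5.0)]
    print(mmd_squared(domain_a, domain_a))  # identical samples
    print(mmd_squared(domain_a, domain_b))  # well-separated samples
```

Values near zero indicate samples that look alike under the kernel; larger values indicate more divergent samples.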

artificial intelligence, natural language, text processing, (16 more...)

arXiv.org Artificial Intelligence

2506.12615

Country:

Africa > Sudan (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
(25 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)


Formalising Software Requirements using Large Language Models

Beg, Arshad, O'Donoghue, Diarmuid, Monahan, Rosemary

arXiv.org Artificial Intelligence Jun-13-2025

This paper is a brief introduction to our recently initiated project named VERIFAI: Traceability and verification of natural language requirements. The project addresses the challenges in the traceability and verification of formal specifications by providing support for the automatic generation of formal specifications and for the traceability of requirements from the initial software design stage through the system's implementation and verification. Approaches explored in this project include Natural Language Processing, the use of ontologies to describe the software system domain, the reuse of existing software artefacts from similar systems (i.e. through similarity-based reuse), and large language models to identify and declare the specifications, as well as the use of artificial intelligence to guide the process.

large language model, logic & formal reasoning, specification, (18 more...)

arXiv.org Artificial Intelligence

2506.10704

Country:

Europe > Ireland (0.05)
Europe > Switzerland (0.04)
Africa > Middle East > Tunisia > Tunis Governorate > Tunis (0.04)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
